Boosting Optimal Logical Patterns Using Noisy Data
Abstract
We consider the supervised learning of a binary classifier from noisy observations. We use smooth boosting to linearly combine abstaining hypotheses, each of which maps a subcube of the attribute space to one of the two classes. We introduce a new branch-and-bound weak learner to maximize the agreement rate of each hypothesis. Dobkin et al. give an algorithm for maximizing agreement with real-valued attributes [9]. Our algorithm improves on the time complexity of Dobkin et al.'s as long as the data can be binarized so that the number of binary attributes is o(log(number of observations) × number of real-valued attributes). Furthermore, we have fine-tuned our branch-and-bound algorithm with a queuing discipline and an optimality gap to make it fast in practice. Finally, since logical patterns in Hammer et al.'s Logical Analysis of Data (LAD) framework [8, 6] are equivalent to abstaining monomial hypotheses, any boosting algorithm can be combined with our proposed weak learner to construct LAD models. On various data sets, our method outperforms state-of-the-art methods that use suboptimal or heuristic weak learners, such as SLIPPER. It is competitive with other optimizing classifiers that combine monomials, such as LAD. Compared to LAD, our method eliminates many free parameters that restrict the hypothesis space and require extensive fine-tuning by cross-validation.
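To make the abstract's key objects concrete, the following is a minimal sketch of an abstaining monomial hypothesis over binarized attributes and a branch-and-bound search maximizing its weighted agreement. All names here (`agreement`, `best_pattern`) are illustrative, and the bound used (covered positive weight alone) is a deliberately simple upper bound; the paper's actual weak learner uses a tuned queuing discipline and optimality gap rather than this plain recursion.

```python
def agreement(pattern, X, y, w):
    """Weighted agreement of a positive-class monomial pattern.
    pattern: dict mapping attribute index -> required binary value (a subcube).
    Covered points with y=+1 add their weight, covered y=-1 subtract theirs;
    the hypothesis abstains (contributes 0) outside the subcube."""
    score = 0.0
    for xi, yi, wi in zip(X, y, w):
        if all(xi[j] == v for j, v in pattern.items()):
            score += wi if yi == 1 else -wi
    return score

def best_pattern(X, y, w, j=0, pattern=None, best=(float("-inf"), None)):
    """Illustrative branch-and-bound over monomials: at attribute j, branch on
    leaving x_j free, fixing x_j = 1, or fixing x_j = 0.  Bound: the covered
    positive weight by itself, since adding literals only removes points, so it
    upper-bounds the agreement of every refinement of the current pattern."""
    if pattern is None:
        pattern = {}
    covered = [(yi, wi) for xi, yi, wi in zip(X, y, w)
               if all(xi[k] == v for k, v in pattern.items())]
    pos = sum(wi for yi, wi in covered if yi == 1)
    neg = sum(wi for yi, wi in covered if yi == -1)
    if pos - neg > best[0]:
        best = (pos - neg, dict(pattern))
    # Prune: no refinement can score above pos, and we are out of attributes at j = n.
    if j == len(X[0]) or pos <= best[0]:
        return best
    for branch in (None, 1, 0):   # leave x_j free, fix to 1, fix to 0
        if branch is not None:
            pattern[j] = branch
        best = best_pattern(X, y, w, j + 1, pattern, best)
        pattern.pop(j, None)
    return best
```

For example, on four points with two binary attributes where the first attribute alone determines the label, the search recovers the single-literal pattern {0: 1}, which covers both positives and no negatives. Inside a smooth-boosting loop, the weights `w` would be the booster's current distribution over observations, reweighted after each round.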
Similar papers
Some Theoretical Aspects of Boosting in the Presence of Noisy Data
This is a survey of some theoretical results on boosting obtained from an analogous treatment of some regression and classification boosting algorithms. Some related papers include [J99] and [J00a,b,c,d], which is a set of (mutually overlapping) papers concerning the assumption of weak hypotheses, behavior of generalization error in the large time limit and during the process of boosting, compar...
On Boosting and Noisy Labels
Boosting is a machine learning technique widely used across many disciplines. Boosting enables one to learn from labeled data in order to predict the labels of unlabeled data. A central property of boosting instrumental to its popularity is its resistance to overfitting. Previous experiments provide a margin-based explanation for this resistance to overfitting. In this thesis, the main finding ...
STRUCTURAL DAMAGE PROGNOSIS BY EVALUATING MODAL DATA ORTHOGONALITY USING CHAOTIC IMPERIALIST COMPETITIVE ALGORITHM
Presenting the structural damage detection problem as an inverse model-updating approach is one of the well-known methods that can reach informative features of damages. This paper proposes a model-based method for fault prognosis in engineering structures. A new damage-sensitive cost function is suggested by employing the main concepts of the Modal Assurance Criterion (MAC) on the first severa...
Identification of Cement Rotary Kiln in Noisy Condition using Takagi-Sugeno Neuro-fuzzy System
The cement rotary kiln is the main part of the cement production process and has always attracted many researchers' attention. However, this complex nonlinear system has not been modeled efficiently enough to achieve appropriate performance, especially in noisy conditions. In this paper, a Takagi-Sugeno neuro-fuzzy system (TSNFS) is used for identification of a cement rotary kiln, and gradient descent (GD) algori...
Combining Bagging and Boosting
Bagging and boosting are among the most popular resampling ensemble methods that generate and combine a diversity of classifiers using the same learning algorithm for the base classifiers. Boosting algorithms are considered stronger than bagging on noise-free data. However, there are strong empirical indications that bagging is much more robust than boosting in noisy settings. For this reason, i...
Publication date: 2007